GE LDA Survival
GE LDA Survival is a method for improving survival prediction in cancer patients. It derives features from high-dimensional gene expression data using topic modeling techniques from natural language processing. This addresses the very high dimensionality of genomic data, which standard survival prediction models struggle to process effectively.
Mimicking how documents are represented as mixtures of topics in natural language processing, GE LDA Survival represents each patient as a mixture of cancer topics, each a mixture of gene expression values. The standard Latent Dirichlet Allocation (LDA) model was extended to adapt to the real-valued nature of gene expression data, resulting in a discretized LDA (dLDA) procedure. This adaptation allows for deriving expressive features from gene expression data, capturing the heterogeneity of a patient's cancer.
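The discretization step can be illustrated with a minimal sketch. The record does not specify the exact encoding, so the scheme below is an assumption made for illustration only: each gene is z-scored across patients, and the magnitude of the z-score becomes the count of a pseudo-word such as `TP53_up` or `TP53_down`. The function name `discretize_expression` and the `scale` parameter are hypothetical and are not taken from the GE LDA Survival software.

```python
import math
from collections import Counter


def discretize_expression(expr, scale=2.0):
    """Turn real-valued expression into per-patient pseudo-word counts.

    expr: dict mapping patient id -> dict mapping gene name -> expression value.
    Assumed scheme (illustrative, not confirmed by the source): z-score each
    gene across patients, then emit round(scale * |z|) copies of the word
    "<gene>_up" (z > 0) or "<gene>_down" (z < 0).
    """
    genes = sorted({g for values in expr.values() for g in values})

    # Per-gene mean and standard deviation across all patients that have it.
    stats = {}
    for g in genes:
        vals = [values[g] for values in expr.values() if g in values]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[g] = (mean, math.sqrt(var))

    docs = {}
    for patient, values in expr.items():
        words = Counter()
        for g, v in values.items():
            mean, sd = stats[g]
            if sd == 0:
                continue  # a constant gene carries no information
            z = (v - mean) / sd
            count = int(round(scale * abs(z)))
            if count > 0:
                words[f"{g}_up" if z > 0 else f"{g}_down"] = count
        docs[patient] = words
    return docs


if __name__ == "__main__":
    toy = {
        "p1": {"TP53": 1.0, "BRCA1": 5.0},
        "p2": {"TP53": 3.0, "BRCA1": 5.0},
    }
    print(discretize_expression(toy))
```

Each resulting `Counter` plays the role of a bag-of-words document, which is the input form a standard LDA implementation expects.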
After employing dLDA to learn cancer topics, patients are expressed as distributions over a limited number of cancer topics. This low-dimensional representation, termed "distribution vector," is then utilized as input for a survival prediction algorithm, specifically the Multi-Task Logistic Regression (MTLR). The methodology was initially applied to the METABRIC dataset, involving 1,981 breast cancer patients characterized by 49,576 gene expression values from microarrays, and subsequently validated on the Pan-kidney (KIPAN) dataset with 883 patients and 15,529 gene expression values, demonstrating its applicability across different cancer types and gene expression modalities.
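To make the second stage concrete, the sketch below shows how an MTLR-style model could turn a patient's topic distribution vector into a survival curve. It follows the commonly described MTLR formulation, in which one logistic regressor is attached to each time point and the score of "event in interval k" is the sum of the linear scores from k onward. Parameter training is not shown, and the function name `mtlr_survival_curve` and all parameter values are hypothetical, so this should be read as an illustration under those assumptions and not as the tool's implementation.

```python
import math


def mtlr_survival_curve(x, thetas, biases):
    """Survival probabilities at m time points from MTLR-style parameters.

    x: topic distribution vector (one entry per cancer topic).
    thetas: list of m weight vectors, one per time point.
    biases: list of m scalar offsets, one per time point.
    Returns [S(t_1), ..., S(t_m)], a non-increasing sequence.
    """
    m = len(thetas)

    # Linear score theta_j . x + b_j for every time point j.
    lin = [
        sum(w * xi for w, xi in zip(theta, x)) + b
        for theta, b in zip(thetas, biases)
    ]

    # Score of "event in interval k" is the sum of lin[k:].
    # The extra last entry (score 0) means "survives past the final time point".
    scores = [sum(lin[k:]) for k in range(m)] + [0.0]

    # Softmax over the m + 1 outcomes, shifted by the max for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]

    # S(t_k) is the probability that the event happens after t_k.
    return [sum(probs[k + 1:]) for k in range(m)]


if __name__ == "__main__":
    topic_vector = [0.5, 0.3, 0.2]          # hypothetical 3-topic patient
    zero_weights = [[0.0, 0.0, 0.0]] * 3     # untrained model, 3 time points
    print(mtlr_survival_curve(topic_vector, zero_weights, [0.0, 0.0, 0.0]))
```

With all parameters at zero, every outcome is equally likely, so the curve simply steps down evenly; trained weights would tilt the curve according to the patient's topic mixture.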
The results from both datasets indicate that dLDA followed by MTLR yields survival estimates that are more accurate than those of standard models, as measured by the Concordance index. Furthermore, the models were found to be well calibrated according to the "D-calibrated" measure, which supports the reliability of the predicted survival probabilities.
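For readers unfamiliar with the Concordance index, the following minimal sketch computes a Harrell-style version on right-censored data: among all comparable pairs of patients, it reports the fraction in which the patient who failed earlier was assigned the higher risk. This is a generic textbook formulation, and the function name `concordance_index` is chosen for illustration; the source does not state exactly how the published evaluation derived risk scores from the predicted survival curves.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index: fraction of comparable pairs ranked correctly.

    times: observed follow-up times.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    risks: predicted risk scores, where higher means expected to fail sooner.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [1, 1, 0]
    print(concordance_index(times, events, [3.0, 2.0, 1.0]))
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is reversed.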
Topic
Oncology;Microarray experiment;Gene expression;Machine learning;Surgery
Detail
Operation: Essential dynamics;Expression data visualisation;Gene prediction
Software interface: Command-line interface
Language: C
License: GNU Lesser General Public License, version 2.1
Cost: -
Version name: -
Credit: Alberta Machine Intelligence Institute (Amii) and NSERC.
Input: -
Output: -
Contact: Russell Greiner rgreiner@ualberta.ca
Collection: -
Maturity: -
Publications
- Gene expression based survival prediction for cancer patients-A topic modeling approach.
- Kumar L and Greiner R. Gene expression based survival prediction for cancer patients-A topic modeling approach. PLoS One. 2019; 14:e0224446. doi: 10.1371/journal.pone.0224446
- https://doi.org/10.1371/JOURNAL.PONE.0224446
- PMID: 31730620
- PMC: PMC6857918
Download and documentation
Documentation: http://pssp.srv.ualberta.ca/home/about